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Summary. Like mean, quantile and variance, mode is also an important measure of central 
tendency and data summary. Many practical questions often focus on "Which element (gene 
or file or signal) occurs most often or is the most typical among all elements in a network?". 
In such cases mode regression provides a convenient summary of how the regressors affect the 
conditional mode and is totally different from other regression models based on conditional mean 
or conditional quantile or conditional variance. Some inference methods have been used for mode 
regression but none of them from the Bayesian perspective. This paper introduces Bayesian mode 
regression by exploring three different approaches. We start from a parametric Bayesian model by 
employing a likelihood function that is based on a mode uniform distribution. It is shown that 
irrespective of the original distribution of the data, the use of this special uniform distribution 
is a very natural and effective way for Bayesian mode regression. Posterior estimates based on 
this parametric likelihood, even under misspecification, are consistent and asymptotically normal. 
We then develop a nonparametric Bayesian model by using Dirichlet process (DP) mixtures of 
mode uniform distributions and finally we explore Bayesian empirical likelihood mode regression 
by taking empirical likelihood into a Bayesian framework. The paper also demonstrates that a 
variety of improper priors for the unknown model parameters yield a proper joint posterior. The 
proposed approach is illustrated using simulated datasets and a real data set. 
Keywords: Bayesian inference; empirical likelihood; Markov Chain Monte Carlo methods; mode 
regression; nonparametric Bayesian; parametric Bayesian. 
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1 Introduction 



Mode, the most likely value of a distribution, has wide applications in biology, astronomy, eco- 
nomics and finance. For example, it is not uncommon in many fields to encounter data distributions 
that are skewed or contain outliers. In those cases, the arithmetic mean may not be an appropriate 
statistic to represent the center of location of the data. Alternative statistics with less bias are the 
median and the mode. The mean or median of two densities may be identical, while the shapes 
of the two densities are quite different. Mode preserves some of the important features, such as 
wiggles, of the underlying distribution function, whereas the mean or median tend to average out 
the data. Actually, as an important statistic, mode has been used in moder n science to iden- 



tify th e most frequent or the most typical element in certain n etwor k 



(2003), 



Heckman. Geiser. Eidell. Stauffer. Kardos. and Hedges! (|200ll ) , iKumar and Hedged (|l998l ) 



Markov et al 



systems 



Hed ges and S hah 



19971 )). 



A mode estimator is often defined as the maximum of the estimated distribution density, 
typically under nonparametric kernel estimation. Such mode estimat ion has at t racted a lot of 



attent i on in the s t atisti c s literature fo r decades by various author s (lYasu 



(119621) .iGrenanderl (|1965l ). lEddvl f|l98d ^) . iBickel and Fan! (|l996l ). 



and 



Meverl ( 



2001 



Birgel (j 19971 ) 



kawa 



1926), 



Berlinet et al 



Parzcn 



1998) 



among others). Similarly, conditional mode estimatio n is typically carrie d 



out by conditional den sity e stimation via diff e rent nonparame tric methods 



Hall and Huang! |200l|) and 



Hall et al 



(2001 



Gasser et al 



Dunson et al. 



99 



20071 ) 



), iBrunneri (119921 ). Ho (2006) 
among others). However, these nonparametric conditional density based mode regression models 
are not a direct estimation of the conditional mode. The problem with these methods is twofold: 
the estimation of the conditional density may suffer from the well-known "curse of dimensionality" 
and, it is hard to describe and interpret the estimated conditional mo de in terms of predictors 



or co variates. Direct inference for mode regression was explored by 

~3 



Lee first in 1989 and then in 



1993 (ILeei 



198£ 



1993 ). How ever, it has not been well-applied due to lack of proper inference tools. 



Recently, iKemp and Silval (2012) relaxed Lee's restriction on truncated dependent variables and 
employed alternative kernel estimation. However, their regression coefficient estimation has slow 
convergence rate, involves bandwidth selection and provides only approximate normal confidence 
intervals. Moreover, direct Bayesian method for mode regression is not available but there is clear 
practical motivation from this perspective. 

In this paper we introduce a fully Bayesian framework for direct mode regression inference 
by using three approaches: a parametric Bayesian method, a nonparametric Bayesian method 
and an empirical likelihood based Bayesian method. The remainder of the paper is organized as 



2 



follows. Section [2] introduces the three approaches, describes the theoretical and computational 
framework of these methods and gives their mathematical justification. In Section [3] we illustrate 
the proposed methods through two simulated case-studies and a real example. We conclude with 
a short discussion in Section [H 



2 Bayesian mode regression 



Consider an arbitrary random variabl e Z, let 



and let K(Z; •) be the step-loss function (jManskil . Il99ll ) such as, 



Fz(z) be the distribution of Z with density fz(z) 



K(Z;n)=I[¥—^>l], (1) 

with a > and I [A] being the indicator function of event A. If fz( z ) is symmetric about ji or if 
fi is the middle value of the interval of length 2a that captures the most probability under Fz(z) 
then jl = argmin^i?{L(Z; //)} is the mode of Z . Lee (1989) introduced mode regression, or the 
conditional mode of y given x, as mode{y\x) = x'j3 based on the loss function K(y;x'f3), where 
P is the regression parameter. That is, given a sample {(xi,yi), ... (x n ,y n )} from (x,y), when a 
approaches 0, the parameter j3 in the conditional model of y\x is estimated by 

n 

j3 = argming ^ K(yr, a;-/3) (2) 

i=l 

2.1 Parametric Bayesian method 

The conditional mode denoted as mode(y\x) = x' j3 can be reformulated as a standard regression 
model 

V = x'/3 + e (3) 

with zero mode for the error term e. 

Given a sample {(xi,yi), ... (x n ,y n )} from (x,y), note that (3 = argmax^ Ya=i ^WVi ~ x i@\ - 
a]. That is, (3 in ([2]) can be regarded as the maximum likelihood estimates of the "working" 
likelihood function 

n 

1=1 

Therefore, the Bayesian mode regression estimates, denoted as (3 B , can be obtained using the 
posterior distribution of (3, 

Tr(P\y)cxL(y\p)n((3), (5) 
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where tt((3) is the prior distribution of (3. Although a standard conjugate prior distribution is not 
available for the mode regression formulation, MCMC methods may be used for extracting the 
posterior distributions of (3. 



2.2 Consistency and asymptotic normality 

The classical mode regression parameter estimator (3 of 



Lee (1989 



199J) 



1 n 

(3 = argmaxp-^2l[\yi - x[(3\ < a] 



is known to be consistent. According to 



White! (|1982l ). any posterior estimator f3 B from the like- 



lihood function (4) with a flat prior, even if misspecified, is still consistent, in the sense of mini- 
mization of the Kullback-Leibler distance between the true distribution and the parametric family 
to whi c h the approximation belongs or in the spirit of the quasi-maximum likelihood estimator of 



White! (|1982h . 

Further, although Bayesian inference does not require large sample theory we provide evidence 
that the posterior distribution obtained via the proposed Bayesian approach, under certain regu- 
larity conditions, is asymptotically normal when the sample size increases. This is the same as the 

ipse asymptotic normality was derived under a special case 



classical mode regr ession e s timat o r (3 w 
of "M-estimators" (jHuberl \l973 ). \Le4 (|1993ft ). 

In fact, let I((3) = E[{-^ log f(y\(3)} 2 ] be the total Fisher information in the data and define 
I n ((3 B ) = I {(3) & >. . Under certain regularity conditions, Taylor power series expansions of the 

P = H B 

logarithm of the posterior distribution leads to 

log7r(/%) = log7r03 B |y) - ±(J3 - p B ) T I n (P B )((3 - B ), 

hence, 

vr(/3|y) « 7T0 B \y) exp[-i(/3 - p B fl n B )((3 - P B )} 

oc exp[-i(/3 - B ) T I n B )(/3 - P B )], 

which is the kernel of a N p ([3\(3 B , density. This implies that y/n(f3 B —/3) ~ N(0, I~ l {f3 B )), 

where I~ l {(3 B ) is specified below, which has the same form as the asymptotic distribution of "M- 
estimators". On the other hand, the likelihood function associated with Bayesian mode regression 
can also be formulated as 



L((3) 



(2aY 



Y[exp(-I[\ yi - x'^l < a}), 



i=i 
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which is a likelihood function based on the uniform probability density 

e 



2a 



exp(-/[|u - fJ,\< a}), 



for a window parameter a > 0. Then, under a flat prior, from 



we have 



f(y\x, ^<t) = — exp[-I[\y - x'(3\ < a}}, 



log f(y\x, (3,a) = l- log(2a) - I[\y - x T f3\ < a], 
^logf(y\x,(3,a) = -I[\y-x T (3\<a], 



d 



(6) 



I n {(5 B ) = E[{— = E{I[\y - x 1 0\ < a}}^. 

Thus, the asymptotic justification of using the proposed "working" likelihood for parametric Bayesian 
mode regression is fully outlined. 



2.3 The estimation of covariance matrix of classical estimates 



Under the classical approaches of 



Lee 1989 



19931 ) and iKemp and Silval (|2012l ) , the covariance 



matrix, cov{/3} of the classical estimator j3 and its inverse are often required but difficult to es- 
timate or compute numerically, especially under small or moderate samples. A by-product of the 
proposed Bayesian approach is that using the MCMC posterior sample leads to a natural and 
efficient estimation of cov {/3} and other asymptotic quantities of (3. 

In fact, a MCMC scheme constructs a Markov chain whose equilibrium distribution is the joint 
posterior, p(f3\data). After running the Markov chain for a burn- in period, one obtains samples 
from the limiting distribution, provided that the Markov chain has converged. Given that the chain 
has converged, the frequency of appearance of the parameters in the Markov chain represents their 
posterior distribution. An informative full density distribution of the model parameters is readily 
obtained rather than a single point estimate as in the classical approach. 

When a Markov chain, 5, is drawn from the posterior distribution, p{(3\ data): S = {(3^ l \(3^ 2 \ ...,(3^) 
where N is the number of draws after burn-in, a consistent estimate of the inverse of the covariance 
matrix co y{(3} can be obtained by multip lying by N the variance-covariance matrix of this MCMC 



sequence (jChernozhukov and Hongl . 



2003 ). 



2.4 Prior selection and proper posteriors 

In this section we demonstrate that in the absence of any realistic information one could use 
improper uniform prior distributions for all the components of j3 as such a choice yields a proper 
joint posterior and then we address the important issue of specifying a prior for parameter a. 
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Below, we show that if we choose the prior of (3 to be improper uniform, then the resulting 
joint posterior distribution will be proper. 

Theorem 1: If the likelihood function is given by ([4]) and 7r(/3) oc 1, then the posterior distribution 
of (3, ir((3\y), will be a proper distribution. In other words 

< J n(f3\y)df3 < oo, 

or, equivalently, 

< J L(y\(3) tt(/3) d/3 < oo. 



The proof can be found in Appendix A 



In practice one usually assumes that the components of /3 have independent improper uniform 
prior distributions which is a special case of the above theorem. 

Next we address the issue of determining a suitable prior for the parameter a. The aim is to 
manage to determine a value that is neither too small no r too large to pr event underutilization of 



Lcc (1989 



19931 ) suggested some possible 



data with \yi-x\b\ < a or with \yi — xfil > a respectively, 
methods for determining the value of a (including trial and error and bootstrapping methods). 

In this work we apply a Uniform(wi, W2) prior on a, where Wi can be determined using one 
or more of the following three rules-of-thumb, depending on the assumption for the underlying 
distribution. 

• The empirical rule, which states that, given a symmetric distribution, approximately 99.7% of 
the data values fall within three standard deviations (sd) of the mean, therefore, Wi = 3 * sd; 

• Chebyshev's Theorem, which is true for any sample set no matter what the distribution is, 
and states that at least 93.75% of the data values fall within four standard deviations of the 
mean, therefore, w; = 4 * sd: 



Variations of Silverman's plug- in estimate for the bandwidth ([Silverman! . 119861 ). a simple 
formula for Wi that depends on the sample size n and the sample standard deviation sd, given 
by Wi = 1.3643(5n -0 ' 2 [min(sd, /Qi2/1.349)] where IQR is the sample inter quantile range and 
5 = 1.3510 for a uniform kernel. This formula assumes data which is normally distributed 
and uses IQR/ 1.349 as an alternative estimate of W{ that protects against outliers. These 
plug-in estimates for Wi work well in practice, especially for symmetric unimodal densities 
even if the data is not normally distributed. Alternatively, IQR/1.3^9 can be replaced by 
1.4826 * MAD to cover data with large number of outliers. 
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However, the choice of a suitable prior for a can be difficult in practice, therefore with the aim 
of developing a more flexible model, in the following section, we relax the distributional assumption 
on the prior for a using a Dirichlet process prior. This leads to a flexible nonparametric mixture 
model. The method is nonparametric in the sense that we do not assume that the prior belongs to 
any fixed class of distributions. 



2.5 Nonparametric Bayesian method 

In this section, we formulate a nonparametric Bayesian mode regression model to avoid critical 
dependence on the mode uniform distribution assumption thus to address the issue of misspecifi- 
cation that may arise under the parametric Bayesian method. 

A density /(•) on M + is non-inc reasing if and only if there exists a distribution function G such 
that f(x\G) = f a" 1 1[Q <x<(7 ]dG(a) ( Fellerl . Il97ll ). Therefore, any unknown density /(•) (with mode 
9), symmetric or not, can be represented as a scale mixture of symmetric uniform distributions, 
that is 



f(x\0,G) = J ^Ii-^-g^dGia), (7) 

where G is the mixing distribution supported on M + . 

Then a nonparametric Bayesian mode regression model can be expressed in the hierarchical 
form 



2/i|/3,(7i ~ f(yi - Xi(3;(Ti),i = 1 

, iid 



n 



(8) 



(Ji\G ~ Gr, i = 1 • • • n 
G\M,d~ DP(M, G (-,d)) 
f3,M,d~p(p), p (M),p{d), 
where, G is the mixing distribution, with base distribution Gq and concentration parameter M and 



2a I [-a<y i -x' i 



fl <(7 ] is the density of a uniform distribution on (—a, a). 



We take a uniform distribution as the base distribution, Go, uniform prior for M and we choose 
non-informative Normal priors for all the components of f3. 



2.6 Empirical likelihood based Bayesian method 

In addition to parametric and nonparametric likelihood, an empirical likelihood based method 
could be an alternative for Bayesian mode regression. To deriv e an e mpiri cal likelihood for mode 



regression we begin wi th notatio ns and a moment restriction. 



regression estimator of 



Led (|1993l ) generalized the mode 



Led (jl989l ). = argminpE{L(Y — x'(3)}, by using the rectangular kernel 
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L(Y;/jl) = {(a 2 — (Y — fi) 2 )I[\Y — < a]}. Therefore, the moment restriction for the empirical 
likelihood can be obtained by the derivative -^-L(Y; fi) = 2(Y — /j,)I[\Y — fJ,\ < cr]. Let be the 
'derivative' of then the mode, /i, of Y satisfies the moment restriction E(1(jjl)) = 0, where 

l( f i) = (Y-v)I(\Y- fJ ,\<a). 

Under an empirical likelihood for mode regression fi = x'j3, thus for any proposed (3 to estimate 
the true p dimensional /3 via empirical likelihood, we use the vector estimating functions g(X, Y, (3) 
with component gj(X,Y, (3) = l(X,Y, (3) Xj for j = l,..,p. Then, the profile empirical likelihood 
ratio is given by 

n n n 

9K{/3) = max{[[(n P i)\J2Pi9(XuY i ,/3) = 0, Pi >0, ^^ = 1}. 

i=l i=l i=l 

By a standard Lagrange multiplier argument we have 



i=l 

with the weights Pi((3) = — — - 1 „ , where the Lagrange multiplier A satisfies 

n(l+A g(Xi,Yi,fj)) 

^ 1 l + X T g(X i ,Y i ,0) ■ 

According to iQin and Lawless! (119941 ). among others, the existence and uniqueness of A are 
guaranteed when the following two conditions are satisfied: (1) zero belongs the convex hull of 
{g(Xi,Yi, (3), i = l,...,n} and (2) the matrix $^i=i{5(-^i> Yi, (3)g(Xi, Yi, f3)'} is positive definite. 

Under Bayesian inference we consider the empirical likelihood function £H(/3) /n n = Yli=i {Pi(fl)}i 
which can be combined with a prior specification vr(/3) on the parameter j3 to obtain the posterior 
distribution 

ir(fl\data) oc vr(/3) <R(J3). 

2.7 Asymptotic properties of Bayesian empirical likelihood 

Before studying the asymptotic normality of the empirical likelihood based Bayesian mode 
regression parameter estimates, we should provide the consistency of the empirical likelihood esti- 
mator, which is a necessary condition for the asymptotic normality of the posterior. As the criterion 
function g( X, Y, f3) results in non-smoo th estimating equations we employ a similar method to the 



one use by 



Molanes Lopez et al 



(120091 ). among others, to derive our asymptotic results. 
Let = argmaxp9\(f3) be the empirical likelihood estimates in a compact set of parameter 
space which contains the true parameter (3 . Then note that our criteri on function g(X, Y, (3 ) can 



be regarded as a special case of M-estimators as discussed in Chapter 5 of 



Van der Vaart 



19981 ) and 



satisfies the conditions of theorem 5.7 in the book. Under some regular conditions such as uniformly 
continuous and bounded imposed on the marginal distribution of X and conditional distribution of 
Y given X, and assume the matrix E{g(X, Y, (3) g(X, Y, (3)'} > 0, then E{g(X, Y, (3)} is sufficiently 
smooth in a compact set of parameter space which contains /3 , so the consistency condition C3 of 
Molanes Lopez et al. ( 2OO9I ) holds, that is, the consistency of (3's empirical likelihood estimates is 
established. 

The asymptotic normality of the posterior distribution ir((3\data) could be established using the 
fact that the empiric al log-likelihood ratio for (3 is well approximated by certain quadratics in the 



sense of Lemma 6 of 



Molanes Lopez et al 



r„(/3) 



-n 



|2009j) so that, 

n 

^logOL + A^pQ,!^)) 



8=1 



= -i(/3 - (3 )'V l2 Vll- l V l2 {(3 - (3 ) + n- l /\(3 - ^)'V[ 2 V^W n - ± n - l W^V u l W n + o P ( n - 1 ), 
with matrices V n = (E{ 9j (X,Y, (3) g k (X,Y, (3)'}), V 12 = -^E{ 9j (X, Y,/3)}), and vector W n = 

n-^T.Ug^Y^)). 

Then from \ogy{{(3) = nT n ((3) we have the posterior 

it{(3\data) = vr(/3) JH(/3) oc exp{-i(/3 - fr)'I n {p - 0) + O p (l)}, 
where I n = nV{ 2 V 11 1 Vi2 and (3 is the empirical likelihood estimate. 



3 Numerical experiments 

In this section we demonstrate our approach to Bayesian mode regression through two simulated 
and one real examples. The real example is based on the Western Electric Workers (WECO) 
dataset and investigates how the worker's gender, pre-employment test result and education, can 
affect productivity. 

3.1 Simulation example 1 

We consider a simulated data from the model 

Vi = A) + PiXi + £i, (9) 

where Xi ~ iV(0, 1) and i = l,...,n, with n = 50,100,200. We set (3 = (1,2) and consider the 
following three specifications for the model error e: 

• Case 1: the standard normal distribution, ej ~ N(0, 1) - a symmetric error distribution; 
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Case 2: a Fisher's Z distribution, e» ~ l/2logZ with Z ~ Fi^i - a skewed error distribution; 

Case 3: a normal distribution with normally distributed outliers (contaminants) centered 
at twice the distance between the true mode and the 99 th percentile of the original normal 
distribution and account ing for 20% of the total data points, ~ 0.80./V(0, \) + 0.20iV(2.5, \) 



(jHedges and Shah , 



2003 ) - an asymmetric error distribution. 



We fit parametric Bayesian mode regression (labeled PBMR) for all the cases above. Then for 
demonstration and comparison purposes we fit empirical likelihood based Bayesian mode regression 
(labeled ELBMR) for case 2 and nonparametric Bayesian mode regression (labeled NBMR) for case 
3. 

For PBMR and ELBMR, we chose independent improper uniform priors for all the components 
of (3 and we simulated realizations from the posterior distributions by means of a single-component 
Metropolis-Hastings algorithm. Each of the parameters was updated using a random- walk Metropo- 
lis algorithm with a Gaussian proposal density centered at the current state of the chain. The 
variance of the proposal dens ity was determined to provide an acceptance rate close to the optimal 



acceptance rate as defined in 



series plots and the R package boa (ISmith 



Roberts and Rosenthal! (120011 ). Convergence was assessed using time 



20071 ). The estimates are posterior means using 10,000 



iterations of the MCMC sampler (after 10,000 burn-in iterations). 

The estimates for NBMR were obtained by fitting a truncated Dirichlet Process (DP) mix- 
ture model, which leads to a computationally straightforward approximation and can be easily 
implemented in the freely available WinBUGS software. Two parallel chains of equal length with 
different initial values were run for the model. The results were based on 10,000 iterations which 
followed a burn-in period of 40,000 for each chain. 

Table Q] compares the posterior means with the true values of /3o and /3i and also gives standard 
deviations and 95% credible intervals for each of the models considered in this example . 

As expected, the PBRM works well as all the absolute biases for the estimated parameters turn 
out to be in the range [0.01, 0.26]. Furthermore, under both ELBMR and NBRM, the true values 
for both /3q and j3\ are recovered successfully indicating that the methods also work well. However, 
it should be noted that the standard deviations for both parameters are smaller than in the PBMR, 
giving shorter confidence intervals. 

Figure [1] exhibits the empirical samples from the joint posterior distributions of the PBMR 
parameters, which were obtained using the output of the MCMC sampler for the regression param- 
eters (3q and j3\. These samples can be used to obtain a consistent estimator of the covariance or 
correlation structure of the parameter estimators, which is difficult to estimate under the classical 
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Table 1: Simulation Example 1: True parameter values (T.V.) and their posterior means, standard 
deviations (S.D.) and 95% credible intervals 





PBMR 


ELBMR 


NBMR 




Normal 


Skewed 


Asymmetric 


Skewed 


Asymmetric 


n 




Po Pi 


Po Pi 


Po Pi 


Po Pi 


Po Pi 


50 


T.V 
Mean 
S.D. 
35% HPE 


1 2 
0.92 2.00 
0.78 0.77 
(-0.6,2.1X0.5,3.3) 


1 2 
1.07 2.01 
0.78 0.49 
(-0.3,2.6X1.2,3.1) 


1 2 
0.96 2.02 
0.34 0.24 
(0.4,1.7)(1.6,2.5) 


1 2 
1.01 2.00 
0.01 0.01 

(0.99,1.02x1.99,2.01; 


1 2 
1.09 1.94 
0.24 0.19 
(0.7,1.5) (1.5,2.3) 


100 


T.V 
Mean 
S.D. 
35% HPD 


1 2 
1.01 2.10 
0.18 0.25 
(0.6,1.3)(1.6,2.6) 


1 2 
0.95 1.89 
0.52 0.37 
(0.0,1.9)(1.2,2.6) 


1 2 
1.06 1.94 
0.98 0.76 
(-0.7,2.9X0.5,3.3) 


1 2 
1.01 2.00 
0.01 2 
(0.99,1.02X1.99,2.01; 


1 2 
1.06 2.00 
0.14 0.12 
(0.8,1.3) (1.8,2.2) 


200 


T.V 
Mean 
S.D. 
35% HPE 


1 2 
1.26 1.99 
0.86 0.52 
(-0.5,2.8X0.9,3.0) 


1 2 
1.00 1.99 
1.29 0.75 
(-1.3,3.5X0.6,3.3) 


1 2 
1.06 1.96 
0.82 0.42 
(-0.4,2.6X1.2,2.7) 


1 2 
1.01 2.00 
0.01 0.01 

(0.99,1.02x1.99,2.01; 


1 2 
1.04 1.91 
0.07 0.06 
(0.92,1.19X1.78,2.03; 



approach. For example in case (a), with sample size n=100, 
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Figure 1: Plots showing the empirical samples from the joint distributions of mode regression 
parameters 




(a) Symmetric error 



(b) Skewed error 




(c) Asymmetric error 



3.2 Simulation example 2 



In this section we present the results of a second simulation example with the aim of comparing 
the performance of our approach with the classic al mode regression approach. Specifically, we 
replicate the simulation study in iKemp and Silval (2012). but only for a sample of size 250, and 
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compare their results with the results obtained under our Bayesian mode regression approach. 
Simulation data are generated by the simple linear model 

Vi = Pa + + (1 + vxi)ei, (10) 

where Xj are generated from a x^ 3 ) distribution, scaled to have variance 1, and ej are generated as 
independent draws from a re-scaled log-gamma random variable, 

e i = -\ln{Z i ) (11) 

where Z follows a gamma distribution with mean 1 and scale parameter — , to ensure that e% has zero 
mode. Furthermore, we set A = [(1 + 2E(xi)v + E(xf)v 2 )ip(a)] Q to ensure that the unconditional 
variance of the error (1 + vxi) is equal to one. 

The study was performed for a G {0.05,5} and for v G {0,2}. Table [2] compares the 95% 
credible intervals for the estimates obtained under PBMR and NBMR with the 95% confidence 
intervals for the estimates under the two classical mode regression models: Mode 1.6 and Mode 0.8. 
Mode 1.6 and Mode 0.8 correspond to k = 1.6 and k = 0.8 respectively in the bandwidth selection 
rule, bandwidth= k mad n -a 143 , with mad= the median of the absolute deviation from the median 
of ordinary least squares regression residuals. 

Table 2: Simulation Example 2: Comparison between Classical and Bayesian approach for mode 
regression 





PBMR 


NBMR 


Mode 1.6 


Mode 0.8 


a 


n 




95% HPD 


95% HPD 


95% CI 


95% CI 







Po 


(-0.37,0.29) 


(-0.21,0.36) 


(-0.31, 0.41) 


(-0.69, 0.75) 


5.00 


Pi 


(0.82,1.28) 


(0.89,1.32) 


(0.77, 1.24) 


(0.56,1.45) 


2 


/?o 


(-0.06,0.07) 


(-0.03,0.21) 


(-0.15,0.23) 


(-0.25,0.29) 






0i 


(0.99,1.14) 


(0.80,1.22) 


(0.63,1.37) 


(0.48,1.53) 







/So 


(0.00, 0.14) 


(-0.03,0.07) 


(0.12,0.42) 


(-0.09,0.35) 


0.05 


Pi 


(0.95,1.13) 


(0.95,1.06) 


(0.90,1.11) 


(0.87,1.17) 


2 


Po 


(0.02,0.08) 


(0.04,0.09) 


(0.09,0.29) 


(0.01,0.21) 






Pi 


(0.99,1.08) 


(0.97,1.04) 


(0.91,1.19) 


(0.85,1.19) 



The results of the analysis suggest that the Bayesian mode regression estimates are strong 
competitors of the classical mode regression estimates since in almost all the examples both PBMR 
1 ip{-) is the trigamma function 
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and NBMR estimators outperfo rm the two classi cal estimators. 

Finally, as also evident form lKemp and Silval (2012). the selection of the value/prior for a plays 
an important role on the precision of the parameters, an issue that is less restrictive under NBMR. 



3.3 Productivity of Western Electric Workers - WECO 

To illustrate the applicability of our approach we consider a model for predicting the produc- 
tivity of newly hired Electric workers in a manufacturing firm. Productivity (j/j) was modeled as 
a function of a gender indicator (sexi), the score on a physical dexterity exam administrated prior 
to employment (dex{) and the years of education (lexi). 



Ui = A) + Pisexi + fcdexi + /3 3 lexi + ^lexj + (12) 



The data come originally from the study of 



Klein et al 



(|199ll ). but have been modified over 



the years to heighten the pedagogical impact. Figure [2] presents the density plot for productivity 
which is unimodal and almost symmetric (skewness =0.069). 

While the productivity levels range from 10.5 to 19.1, one is interested in how the typical 
productivity level is affected by the model covariates. To estimate this effect we apply our PBMR 
model to estimate the model parameters, /3o 5 Pi, and j3^. The output was obtained by running 

the sampler for 50,000 cycles after a burn-in of 100,000, to ensure convergence and mixing. Table 
[3] summarizes the results. 

The results indicate that on average the mode productivity level of a female worker, who scores 
zero on her physical dexterity exam and has zero years of education is 4.93 units. Furthermore, it 
can be concluded that on average the most frequent productivity level is lower for a male worker, 
while it is higher for workers with a higher exam score. Finally, it is deduced that an additional 
year of education contributes positively to the level of mode productivity. 

Given that under the PBMR a relatively wide credible interval is obtained for the some model 
parameters we also fit a NBMR to the WECO dataset. Again, two parallel chains of equal length 
with different initial values were run for the model. The results were based on 20,000 iterations 
which followed a burn-in period of 50,000 iterations for each chain. As illustrated in Table [31 the 
results obtained under the NBMR are similar to the results obtained under PBRM, but now the 
confidence intervals are much smaller. 
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Figure 2: Density plot for WECO data 
Density Plot - Productivity 




10 12 14 16 18 
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Table 3: Model parameters and their estimated posterior means, standard deviations (S.D.) and 
95% Credible intervals for the WECO data 





PBMR 


NBMR 


Parameter 


Mean 


S.D. 


95% HPD 


Mean 


S.D. 


95% HPD 


ft 


4.93 


8.13 


(-12.6,19.8) 


4.10 


1.11 


(2.56, 6.52) 


ft 


-0.71 


0.46 


(-1.58,0.21) 


-0.84 


0.08 


(-1.03,-0.73) 


ft 


0.12 


0.03 


(0.06,0.18) 


0.12 


0.005 


(0.11,0.12) 


ft 


0.87 


1.27 


(-1.44,3.51) 


1.08 


0.18 


(0.69,1.37) 


ft 


-0.04 


0.05 


(-0.14,0.06 ) 


-0.05 


0.008 


(-0.06,-0.03 ) 



4 Conclusions 

In this paper we introduce a novel Bayesian mode regression framework which includes three 
approaches: a parametric method, a nonparametric method and an empirical likelihood based 
method, as in the area of mode regression, there is no literature from a Bayesian perspective. We 
demonstrate that our estimates are consistent and asymptotically normal under rather standard 
conditions, even under misspecification of the likelihood function. The approaches are easy to 
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implement and have proper inference tools as well as credible intervals available irrespective of the 
sample size. The numerical studies suggest that the proposed Bayesian mode regression estimates 
are strong competitors of the classical mode regression estimates. 

Appendix A 

Proof of theorem 1 

For any a > and m > p, the moments of posterior distribution is given by 

e\WWM = I f{\^^—jz^v[-i[\m-m<cy]]d(3. 

J j=0 ^ ' i=l 

Noting that J2i=i ex P[ — I[\Vi ~ x 'ifl\ < &]] 1S always a constant whether \m — x' { (3\ < a or not 
(i = 1, n). Suppose that the coefficient matrix X = (x\, X2, ■■■,x p ) of mode regression equations 
Hi = xj (3 + €{ is a full rank matrix with rank p, then there is a subset of p constrains \yi — x\(5\ < a 
(i = 1, ...,n) to provide < \/3j\ < Bj < oo (j = 0, 1, ...,p — 1), even if some of \yi — x\fi\ < a are 
true and some are not. Therefore, 

E[\W\a,y] = const. / / ... / J] d(3, 

J—BoJ—Bi J —Bp j_q 

which is finite. 
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